METHOD OF IDENTIFYING HAIRPIN DNA
PROBES BY PARTIAL FOLD ANALYSIS



BACKGROUND OF THE INVENTION 

The use of DNA hairpins as molecular beacons, both in solution (Broude,
Trends Biotechnol. 20:249-256 (2002); Dubertret et al., Nat. Biotechnol. 19:365-370 (2001))
and immobilized on a solid surface (Fang et al., J. Am. Chem. Soc. 121:2921-2922 (1999);
Wang et al., Nucleic Acids Res. 30:e61 (2002); Du et al., J. Am. Chem. Soc. 125:4012-4013
(2003)), has proven to be an excellent method for "label-free" detection (Chan et al., J. Am. 
Chem. Soc. 123:11797-11798 (2001)) of biological entities. This disclosure describes a new 
method of molecular beacon discovery which relies on the generation of naturally occurring 
hairpins. The method of discovery and its advantages shall be discussed herein. 

The traditional method of molecular beacon generation is to supplement a 
naturally occurring DNA sequence at both the 5' and 3' ends with the nucleotides necessary
to force the formation of a hairpin. This technique has a major flaw: the
introduction of nucleotides that are not specific for the intended target sequence increases the
likelihood of non-specific binding. The use of naturally occurring DNA hairpins obviates
this flaw by eliminating the need to add extra bases, resulting in a probe
that is completely specific for its designed target.

DESCRIPTION OF THE INVENTION 

The method of the invention involves obtaining or providing a nucleotide 
sequence from a molecular target. The nucleotide sequence can be sequenced from an 
isolated cDNA or obtained from an online database such as GenBank. Regardless of the 
source of the nucleotide sequence, a partial fold analysis is performed on the nucleotide 
sequence using any of a variety of suitable folding programs, e.g., the RNAstructure
program (available from D. Turner at the University of Rochester, Rochester, NY), the Mfold
software package (available from M. Zuker at Rensselaer Polytechnic Institute,
Troy, NY), and the Vienna RNA software package, including RNAfold, RNAeval, and
RNAsubopt (available from I. Hofacker at the Institute for Theoretical Chemistry, Vienna,
Austria). The resulting folded structure may or may not be the true active conformation of
the RNA molecule in a cellular environment; however, it represents the lowest free energy
state as predicted by such software. It is believed that, more often than not, the predicted
lowest free energy state of the nucleic acid molecule sufficiently resembles the true active
conformation. In either case, the resulting folded structure is analyzed to identify hairpin
regions thereof.
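As an illustration only, the hairpin-identification step can be sketched in Python under the assumption that the folding program reports its lowest-free-energy structure in dot-bracket notation (RNAfold in the Vienna RNA package, for example, prints structures this way). The helper names `pair_table` and `find_hairpins`, and the `max_bulge` limit used to decide when an internal loop ends a stem, are hypothetical choices made for this sketch, not part of the method as described.

```python
from __future__ import annotations


def pair_table(structure: str) -> list[int]:
    """Map each position of a dot-bracket string to its partner (-1 if unpaired)."""
    partners = [-1] * len(structure)
    stack: list[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partners


def find_hairpins(structure: str, max_bulge: int = 3) -> list[tuple[int, int]]:
    """Return 0-based inclusive (start, end) spans of stem-loops in a folded structure.

    A hairpin is seeded at every base pair that encloses only unpaired bases
    (the loop). The stem is then grown outward one enclosing pair at a time,
    tolerating bulges of up to `max_bulge` unpaired bases on either side, and
    stops at a multibranch loop, a larger internal loop, or the sequence end.
    """
    partners = pair_table(structure)
    spans: list[tuple[int, int]] = []
    for i, j in enumerate(partners):
        if j <= i or any(p != -1 for p in partners[i + 1 : j]):
            continue  # not the closing pair of a hairpin loop
        start, end = i, j
        while True:
            a = start - 1
            while a >= 0 and partners[a] == -1:
                a -= 1  # skip unpaired bases on the 5' side
            if a < 0 or partners[a] < a:
                break  # nothing encloses the current stem
            b = partners[a]
            if any(p != -1 for p in partners[end + 1 : b]):
                break  # another helix branches off here (multiloop)
            if start - a - 1 > max_bulge or b - end - 1 > max_bulge:
                break  # internal loop too large to count as a bulge
            start, end = a, b
        spans.append((start, end))
    return spans
```

Each returned span can then be sliced out of the input sequence as `sequence[start:end + 1]`.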

Having identified hairpin structures within the folded structure of the 
prospective target nucleic acid molecule, the hairpin sequences are isolated from the larger 
sequence (i.e., the sequence that was used as input to the folding software). The isolation can be
performed in silico. Once isolated, the hairpin sequence is subjected to a second structural 
prediction as was performed on the prospective target nucleic acid molecule. 
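The in-silico isolation amounts to slicing the hairpin span out of the parent sequence. A minimal Python sketch follows; it assumes 0-based inclusive span coordinates and dot-bracket structures, and the helper name `excise_hairpin` is hypothetical. It also checks that no base pair in the parent fold crosses the cut, since a span that splits a pair is not a self-contained hairpin. The excised sequence would then be refolded on its own by the folding program; that call is not shown.

```python
from __future__ import annotations


def excise_hairpin(sequence: str, structure: str, start: int, end: int) -> tuple[str, str]:
    """Return the subsequence and parent-fold substructure for positions start..end (inclusive).

    Raises ValueError if the span cuts through a base pair of the parent fold.
    The returned substructure is only the parent prediction; the method calls
    for refolding the excised sequence by itself.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    if not 0 <= start <= end < len(sequence):
        raise ValueError("span out of range")
    depth = 0  # bracket depth inside the span
    for ch in structure[start : end + 1]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("span closes a pair that opens before it")
    if depth != 0:
        raise ValueError("span opens a pair that closes after it")
    return sequence[start : end + 1], structure[start : end + 1]
```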

The overall length of the selected hairpin is preferably between about 12 and 
about 60 nucleotides, more preferably between about 20 and about 50 nucleotides, most 
preferably between about 30 and about 40 nucleotides. It should be appreciated, however, 
that longer or shorter nucleic acids can certainly be used. In the preferred
hairpins, the regions forming the stem of the hairpin are preferably at least about 4
nucleotides in length and up to about 28 nucleotides in length, depending on the overall
length of the nucleic acid probe and the size of a loop region present between the portions
forming the stem. It is believed that a loop region of at least about 4 or 5 nucleotides is needed
to form a stable hairpin. The regions forming the stem can be perfectly matched (i.e., having
100 percent complementary sequences that form a perfect stem structure of the hairpin
conformation) or less than perfectly matched (i.e., having non-complementary portions that
form bulges within a non-perfect stem structure of the hairpin conformation). When the two
stem-forming regions are not perfectly matched, they can be the same length or
different lengths.
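The stem and loop dimensions discussed above can be read directly off a single predicted stem-loop. The sketch below assumes the isolated hairpin's structure is given in dot-bracket notation with a single (possibly bulged) helix; the names `HairpinGeometry`, `hairpin_geometry`, and `meets_stem_loop_guidelines`, and the use of exactly 4 and 28 nucleotides as hard cutoffs for the "about" values in the text, are assumptions of this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HairpinGeometry:
    base_pairs: int     # number of base pairs in the stem
    left_arm: int       # nt spanned by the 5' stem region, bulges included
    right_arm: int      # nt spanned by the 3' stem region, bulges included
    loop: int           # unpaired nt closed by the innermost pair
    perfect_stem: bool  # True when neither arm contains bulged bases


def hairpin_geometry(structure: str) -> HairpinGeometry:
    opens = [i for i, ch in enumerate(structure) if ch == "("]
    closes = [i for i, ch in enumerate(structure) if ch == ")"]
    # A single stem-loop has every '(' before every ')'.
    if not opens or len(opens) != len(closes) or opens[-1] > closes[0]:
        raise ValueError("expected a single stem-loop in dot-bracket form")
    left_arm = opens[-1] - opens[0] + 1
    right_arm = closes[-1] - closes[0] + 1
    loop = closes[0] - opens[-1] - 1
    perfect = left_arm == len(opens) and right_arm == len(closes)
    return HairpinGeometry(len(opens), left_arm, right_arm, loop, perfect)


def meets_stem_loop_guidelines(g: HairpinGeometry) -> bool:
    """Each stem region 4-28 nt and a loop of at least 4 nt (cutoffs assumed)."""
    shortest, longest = min(g.left_arm, g.right_arm), max(g.left_arm, g.right_arm)
    return shortest >= 4 and longest <= 28 and g.loop >= 4
```

Unequal `left_arm` and `right_arm` values correspond to the case in which bulges make the two stem-forming regions different in length.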

Importantly, applicants have found that the predicted E value for the hairpin 
should preferably be at most about -3 kcal/mol, more preferably at most about -3.5 kcal/mol, 
most preferably between about -4 kcal/mol and about -12 kcal/mol. It should be appreciated, 
however, that identified hairpins can still function as molecular probes if their predicted E 
value falls outside these ranges. 
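The length and energy preferences stated above form nested tiers, which a screening script might encode as follows. This is a sketch only: it treats each "about" value as an exact cutoff, assumes E is the predicted free energy in kcal/mol (negative for a stable fold), and the function names are hypothetical. A hairpin falling outside these ranges is not excluded from use.

```python
def length_tier(nt: int) -> str:
    """Rank an overall hairpin length against the preferred ranges."""
    if 30 <= nt <= 40:
        return "most preferred"
    if 20 <= nt <= 50:
        return "more preferred"
    if 12 <= nt <= 60:
        return "preferred"
    return "outside stated ranges"


def energy_tier(e_kcal_per_mol: float) -> str:
    """Rank a predicted hairpin free energy (kcal/mol) against the preferred ranges."""
    if -12.0 <= e_kcal_per_mol <= -4.0:
        return "most preferred"
    if e_kcal_per_mol <= -3.5:
        return "more preferred"
    if e_kcal_per_mol <= -3.0:
        return "preferred"
    return "outside stated ranges"
```

Applied to the two Staphylococcus aureus hairpins of Example 2 (AH2: 33 nt, -6.1 kcal/mol; BH2: 37 nt, -3.5 kcal/mol), both fall in the most preferred length range; AH2 also falls in the most preferred energy range, while BH2 sits at the -3.5 kcal/mol boundary.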

Once the structure of the hairpin itself has been predicted, the duplex formed 
between the hairpin and its complement is subjected to a structural prediction as was 
performed on the prospective target nucleic acid molecule and the hairpin. This step, not
necessary for identification of the hairpin per se, is performed primarily to ensure that the
hybridization of the two sequences (hairpin and complement), and thus the disruption of the
hairpin, will be an energetically favorable process. Ideally, the magnitude of the predicted E
value should increase upon duplex formation (i.e., the value should become more negative),
preferably at least about two-fold, more preferably at least about five-fold, most preferably
at least about ten-fold. This structural
prediction also serves to demonstrate the primary advantage of the technique: after 
hybridization, there are no extraneous unhybridized nucleotides and, thus, lowered risk of 
non-specific binding. 
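The comparison between the duplex and the hairpin can be expressed as a simple ratio of predicted free energies. In the minimal sketch below, both energies are assumed to be negative values in kcal/mol, "increase" is read as an increase in magnitude, and the helper names are hypothetical.

```python
from __future__ import annotations


def duplex_to_hairpin_ratio(hairpin_e: float, duplex_e: float) -> float:
    """How many times larger in magnitude the duplex free energy is (kcal/mol)."""
    if hairpin_e >= 0 or duplex_e >= 0:
        raise ValueError("expected negative (favorable) predicted free energies")
    return duplex_e / hairpin_e


def highest_fold_threshold_met(
    ratio: float, thresholds: tuple[float, ...] = (2.0, 5.0, 10.0)
) -> float | None:
    """Return the largest stated fold-increase threshold the ratio reaches, if any."""
    met = [t for t in thresholds if ratio >= t]
    return max(met) if met else None
```

For example, the Example 2 values for BH2 (hairpin -3.5 kcal/mol, duplex -39.0 kcal/mol) give a ratio of about 11, and those for AH2 (-6.1 and -38.3 kcal/mol) give a ratio of about 6.3.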

To further verify the specificity of the hairpin sequence for its complement, 
the hairpin sequence can be used to perform a BLAST database search (of, e.g., the GenBank 
database). Ideally, the resulting BLAST search will show not only high match scores for 
molecular targets (or target organisms), but also a sharp discrepancy (or clear demarcation) 
between the high match scores of the target and any match scores of nucleic acid molecules 
bearing lower similarity. By sharp discrepancy and clear demarcation, it is intended that a 
gap of at least about 5 points, preferably at least about 10 points, more preferably at least 
about 15 points, most preferably at least about 20 points, exists between the target and non- 
target sequences. This is exemplified in Example 1 below. 
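The demarcation test can be stated as the gap between the lowest-scoring target hit and the highest-scoring non-target hit. The sketch assumes the BLAST hits have already been parsed into (score, is_target) pairs, with the target flag assigned, for instance, from the subject organism; that parsing step and the function names are assumptions, not part of the described method.

```python
from __future__ import annotations


def demarcation_gap(hits: list[tuple[float, bool]]) -> float:
    """Lowest target score minus highest non-target score (negative if they overlap)."""
    target = [score for score, is_target in hits if is_target]
    non_target = [score for score, is_target in hits if not is_target]
    if not target:
        raise ValueError("no hits against the target")
    if not non_target:
        return float("inf")  # nothing to separate the target hits from
    return min(target) - max(non_target)


def gap_tier(gap: float, tiers: tuple[float, ...] = (5.0, 10.0, 15.0, 20.0)) -> float | None:
    """Largest stated gap threshold (in score points) that the gap reaches, if any."""
    met = [t for t in tiers if gap >= t]
    return max(met) if met else None
```

With the scores reported in Example 1 (target 78, non-target 42 and lower), the gap is 36 points, which clears the 20-point threshold.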

The probes identified in accordance with the present invention can be used in
any of a variety of hybridization-based applications, typically though not exclusively 
detection procedures for identifying the presence in a sample of a target nucleic acid 
molecule. By way of example, uses of the probes are described in greater detail in U.S. 
Utility Patent Application to Miller, et al., entitled "Hybridization-Based Biosensor 
Containing Hairpin Probes and Use Thereof," filed concurrently with this application and 
expressly incorporated by reference in its entirety. 

Example 1 - Hairpins Targeted to Bacillus anthracis pag Gene 

A partial gene sequence of the Bacillus anthracis Pag gene (isolate IT-Carb3-6254)
(Adone et al., J. Appl. Microbiol. 92:1-5 (2002), which is hereby incorporated by
reference in its entirety) was obtained from GenBank. The secondary structures of
~1000-nucleotide fragments of the aforementioned sequence were then computationally predicted
(RNAstructure v. 3.7: Mathews et al., J. Mol. Biol. 288:911-940 (1999), which is hereby
incorporated by reference in its entirety). Ideally, the secondary structure of the entire
sequence would be predicted, but it was repeatedly found that segments larger than
approximately 1000 bases would crash RNAstructure v. 3.7.

An example of a large-sequence structure prediction is shown in Figure 1.

Figure 1. Secondary structure prediction of B. anthracis Pag gene 541-1560.

As Figure 1 shows, "folding" large DNA sequences reveals
several naturally occurring hairpins. These sequences are then isolated from the full sequence
and subjected to a second structure prediction. Figures 2 and 3 show structural predictions for
two of these excised sequences.
These natural hairpins both appear to be good candidates for use as
molecular beacons, because each is between about 30 and about 40 nucleotides long and
each has a predicted E value between about -4 kcal/mol and about -12 kcal/mol.

Having confirmed that the selected hairpins satisfy the initial selection criteria, a
final structural prediction of each sequence in duplex with its complement was computed
(Figures 4 and 5). This last prediction was done primarily to ensure that the hybridization of
the two DNA sequences, and thus the disruption of the hairpin, will be an energetically
favorable process. Each of these duplexes has a predicted E value about nine- to ten-fold
greater in magnitude than the predicted E value for the hairpin alone, and therefore each is
expected to form a duplex with its target favorably.
Figure 4. Pag 668-706 duplex
E_predict = … kcal/mol

Figure 5. Pag 1209-1241 duplex
E_predict = -42.6 kcal/mol



The specificity of the hairpin of Figure 2 for its target was supported by a
BLAST search of the GenBank database using the Pag 668-704 sequence. The results of this
BLAST search are shown in Figure 6. In particular, the BLAST results indicate
that only sequences from Bacillus anthracis, the target organism, have high scores, whereas
other matching sequences from non-target organisms have significantly lower scores. In
this instance, a clear demarcation exists between target scores (of 78) and non-target scores
(of 42 and lower). This demonstrates that this hairpin will be specific for its target.
Example 2 - Hairpins Targeted to Staphylococcus aureus Genome



Two DNA hairpins, AH2 and BH2, were designed to incorporate portions of
the Staphylococcus aureus genome (GenBank Accession AP003131, which is hereby
incorporated by reference in its entirety). The AH2 sequence appears to target an intergenic
region between ORFID:SA0529 and ORFID:SA0530, and the BH2 sequence appears to
target the same intergenic region but extends several bases into the latter open reading
frame.

A segment of the complete Staphylococcus aureus genome was obtained from
the GenBank database, and the secondary structure of the obtained segment was predicted
using the computer program RNAstructure version 3.7 (Mathews et al., J. Mol. Biol.
288:911-940 (1999), which is hereby incorporated by reference in its entirety). From this predicted
structure, two naturally occurring hairpins were identified, one corresponding to AH2 and the 
other corresponding to BH2. 

Once identified, these two sequences were isolated from the
larger sequence and subjected to a second structure prediction as described above. The 
predicted structure of AH2 is characterized by a predicted free energy value of about -6.1 
kcal/mol and the predicted structure of BH2 is characterized by a predicted free energy value 
of about -3.5 kcal/mol. Both are within the size range of about 30-40 nucleotides. 
AH2: E = -6.1 kcal/mol, 33 nt
BH2: E = -3.5 kcal/mol, 37 nt

Having selected AH2 and BH2, a final structural prediction of the duplexes 
(AH2 and BH2 with their respective complements) was carried out to determine their 
predicted E values. The duplex containing AH2 was predicted to have a free energy value of
-38.3 kcal/mol and the duplex containing BH2 was predicted to have a free energy value of 
-39.0 kcal/mol. These values indicate that the hybridization between the hairpin and its target 
will be an energetically favorable process. A BLAST search was independently performed 
using the AH2 and BH2 sequences, the results indicating that only segments of the 
Staphylococcus aureus genome contain highly related nucleotide sequences. 

The process described above and exemplified in Examples 1-2 has also been
performed using Exophiala dermatitidis 18S ribosomal RNA gene sequences to identify
hairpin probes that can be used to identify the target gene (and organism); Trichophyton
tonsurans strain 18S ribosomal RNA gene sequences to identify hairpin probes that can be
used to identify the target gene (and organism); and Bacillus cereus genomic DNA to identify 
hairpin probes that can be used to identify the target DNA (and organism). 

Although preferred embodiments have been depicted and described in detail
herein, it will be apparent to those skilled in the relevant art that various modifications, 
additions, substitutions, and the like can be made without departing from the spirit of the 
invention and these are therefore considered to be within the scope of the invention as 
defined in the claims which follow. 
What is Claimed:
1. A method of identifying hairpin nucleic acid probes, the method comprising:

providing a target nucleic acid sequence that is larger than about 100 nucleotides in length;

predicting a folded structure of the target nucleic acid sequence; 

identifying the nucleotide sequence of a hairpin within the folded 
structure of the target nucleic acid sequence; and 

predicting a folded structure of the nucleotide sequence of the hairpin, in
the absence of other nucleotides of the target nucleic acid sequence, wherein the folded
structure of the hairpin has a predicted E value of at most about -3 kcal/mol.

2. The method according to claim 1 wherein the nucleotide sequence of 
the hairpin is between about 12 and about 60 nucleotides in length. 

3. The method according to claim 1 wherein the folded structure of the 
hairpin has a predicted E value of between about -4 kcal/mol and about -12 kcal/mol.

4. The method according to claim 1 further comprising: 

predicting a folded structure of a duplex formed between the hairpin and its complement.

5. The method according to claim 4 further comprising: 
determining whether duplex formation is energetically favorable. 

6. The method according to claim 1 further comprising: 

performing a database search for nucleotide sequences that are similar 
to the identified nucleotide sequence of the hairpin. 

7. The method according to claim 6 further comprising: 
determining, from the results of the performed database search,
whether a clear demarcation exists between scores for target nucleic acid sequences and
scores for non-target nucleic acid sequences. 

8. The method according to any one of claims 1-7 further comprising: 
synthesizing a nucleic acid molecule corresponding to the nucleotide sequence of the hairpin.

9. An isolated nucleic acid molecule prepared according to the process of claim 8.